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Abstract 

This NASA Technical Memorandum describes an optimal ensemble canonical correlation 
forecasting model for seasonal precipitation. Each individual forecast is based on the canon- 
ical correlation analysis (CCA) in the spectral spaces whose bases are empirical orthogonal 
functions (EOF). The optimal weights in the ensemble forecasting crucially depend on the 
mean square error of each individual forecast. An estimate of the mean square error of a CCA 
prediction is made also using the spectral method. The error is decomposed onto EOFs of the 
predictand and decreases linearly according to the correlation between the predictor and pre- 
dictand. Since new CCA scheme is derived for continuous fields of predictor and predictand, 
an area-factor is automatically included. Thus our model is an improvement of the spectral 
CCA scheme of Barnett and Preisendorfer (1987). The improvements include (i) the use of 
area- factor, (ii) the estimation of prediction error, and (iii) the optimal ensemble of multiple 
forecasts. The new CCA model is applied to the seasonal forecasting of the United States 
(US) precipitation field. The predictor is the sea surface temperature (SST). The US Cli- 
mate Prediction Center’s reconstructed SST (1951-2000) is used as the predictor’s historical 
data. The US National Center for Environmental Prediction’s optimally interpolated pre- 
cipitation (1951-2000) is used as the predictand’s historical data. Our forecast experiments 
show that the new ensemble canonical correlation scheme renders a reasonable forecasting 
skill. For example, when using September- October-November SST to predict the next sea- 
son December- January-February precipitation , the spatial pattern correlation between the 
observed and predicted are positive in 46 years among the 50 years of experiments. The 
positive correlations are close to or greater than 0.4 in 29 years, which indicates excellent 
performance of the forecasting model. The forecasting skill cam be further enhanced when 
several predictors are used. 
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1 INTRODUCTION 


The great economic significance of long-term forecasting, ranging from monthly to annual, 
provides enormous opportunities for the forecasting research. However, long-term precipi- 
tation forec8ist is extremely difficult to reach satisfactory level of accuracy. Numerous fore- 
casting methods have been developed to improve the forecasting, yet few have systematic 
estimate of the forecasting product’s quality. Both statistical models and dynamical general 
circulation models (GCM) have been explored. Despite the great potential of the coupled 
GCM, its forecasting ability is still limited due to numerous unknown mechanisms, includ- 
ing land-surface processes and cloud structure. The chaotic nature of nonlinear dynamics 
requires both spatially and temporally dense observations to limit the computing drift when 
solving a stack of nonlinear differential equations in a GCM. It is not conclusive yet whether 
the statistical or dynamical model is the best for forecasting. At the US Climate Prediction 
Center (CPC), it wets stated in 1998 that “neither the dynamical nor the statistical models, 
as a group, perform significantly better than the other” (Barnston and He, 1998). CCA, 
OCN (optimal climate normals), and CA (constructive analogue) are the official statistical 
forecasting models used in CPC. These statistical results are compaured with the coupled 
GCM model forecasts. A subjective determination is made when announcing the NOAA’s 
official long term forecasting. 

This NASA Technical Memorandum proposes a systematic method for error estimate 
of statistical forecasting by using a spectral approach with empirical orthogonal functions 
(EOFs) basis (Shen et al., 1998). The paper also proposes an ensemble statistical forecasting 
scheme, which is an ensemble canonical correlation analysis. Our canonical correlation anal- 
ysis (CCA) is developed for two spatially continuous fields and is a slight improvement of the 
scheme of Barnett and Preisendorfer (1987), referred to as BP-CCA. Our error is assessed by 
the mean square error field and the estimate is applied to the continuous canonical correlation 
analysis, referred to as C-CCA. Since our error assessment is an a priori estimate, it provides 
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an estimate on the forecasting quality at the same time when the forecasting is made. Thus, 
the error assessment is different from the traditional cross validation approach, which is a 
postenori approach and assesses the error after the occurrence of the forecasting event. 

The error field has a spatial structure, large over the places of low forecasting skill and 
small over the places of high forecasting skill. Thus, the estimated error indicates the spatial 
inhomogeneity of the forecasting difficulty and assesses the quality of forecasting. Another use 
of the error is for optimal ensemble forecasting. It is common knowledge that when making 
optimization related to a stochastic process, the error information has to be known. Thus, in 
addition to the a priori assessment of the forecast quality, the error is crucial when optimally 
combining several forecasts. In a linear ensemble forecasting, the weight of a particular 
forecast in the ensemble is inversely proportional to this forecast’s mean square error. 

Our inclusion of area-factor, optimal ensemble of several forecasts, and error estimate 
constitute a new development of the statistical forecasting method, which has not only added 
more features to the BP-CCA but also improved the forecasting accuracy. 

The classical CCA algorithm requires an inversion of cross-covariance and auto-covariance 
matrices (see Appendix A). In the climate forecasting practice, all these matrices are not full 
rank because of the short history of climate observations. The BP-CCA scheme in the EOF 
space avoided the inversion of matrices of non- full rank. BP-CCA was a milestone work and 
made it possible to use CCA as an operational forecasting tool According to Kim and North 
(1998, 1999), CCA is an accurate method compared with other statistical methods, such as 
the methods of POP (principal oscillation pattern) (von Storch and Zwiers, 1999), Markov 
(Xue, 2000), EOF extrapolation (Kim and North, 1998), and MCA (maximum covariance 
analysis) (Lau and Wu, 1999). CPC and Canadian Climate Center are using the BP-CCA for 
long term forecasting of temperature and precipitation. Therefore, it is important that the 
method be improved in a timely fashion, if it is possible. Our inclusion of the area-factor can 
reduce the noise of the high latitude data and hence is expected to lead to better accuracy. 
Further, because of our error assessment formula, the forecasting quality at a given location 
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is assessed simultaneously as the forecast is issued. These error estimates are crucial to 
optimally combine the forecasts from different models and data to form an optimal ensemble 
forecast. 

US seasonal precipitation is a difficult predictand because of its large variance. However, 
the clear impact of El Nino Southern Oscillation (ENSO) on the US recitation indicates that 
SST does have potential predictability to the US precipitation. If so, our improved CCA, 
which is supposed to be more accurate than the existing statistical tools, should catch a good 
part of the precipitation signal. Our experiments validated the above assumption. 

In our forecasting experiments, the predictor is the sea surface temperature (SST). The 
US Chmate Prediction Center’s reconstructed SST (1951-2000) is used as the predictor’s his- 
torical data. The US National Center for Environmental Prediction’s optimally interpolated 
precipitation (1951-2000) is used as the predictand ’s historical data. Our forecaist experiments 
show that the new ensemble canonical correlation scheme renders a reasonable forecasting 
skill. For example, when using September- October-November (SON) SST to predict the 
January-February-March precipitation of the next year, the spatial pattern correlation be- 
tween the observed and predicted are positive in 46 years among the 50 years of experiments. 
The positive correlations are close to or greater than 0.4 in 29 years, which indicates excellent 
performance of the forecasting model. The forecasting skill can be further enhanced when 
careful selection of ocean b^^sins is made and more predictors are used. 

The context of this technical memorandum is arrainged as follows. Section 2 illustrates the 
mathematics for the C-CCA. The discretization of the C-CCA is described in Section 3, where 
the symmetry of the covariance matrix with an area-factor is emphasized. The prediction as a 
multi-variate regression of the canonical variables is presented in Section 4. Section 5 studies 
the prediction error estimation and Section 6 shows optimal procedures of combining several 
different forecasts. The characteristics of the US precipitation aie contained in Section 7. 
The US seasonal forecasting results are included in Section 8, and Section 9 gives summary 
and conclusions. 
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2 C-CCA: A CANONICAL CORRELATION ANALYSIS BE- 
TWEEN TWO CONTINUOUS CLIMATE FIELDS 

The CCA forecasting scheme is a regression between two canonical variables U{t) and V(t'), 
which are weighted integrals of two corresponding variables, predictor and predictand. Here t 
and t' = t + At signify time, and At is the lead time of forecast for the predictand. The higher 
the absolute value of the correlation between the two random variables, the more significant 
the regression, and consequently, the stronger linear relationship between the two random 
variables. To msJce the regression statistically significant, the absolute value of the correlation 
between the two canonical variables is maximized by choosing optimal weight functions. 

Let random variables X(x,f) and Y{y,t) represent the predictor and predictand fields, 
respectively. Here x is the position vector defined in the predictor domain fix, y is the 
position vector in the predictand domain fly, and t signifies time. The corresponding weight 
functions are u(x) for X and w(y) for Y, which are called canonical correlation patterns and 
are to be determined by an optimization. Two weighted integrals are defined by 

U{t) = [ dflx A(x,t)it(x) , (1) 

Jcix 

V{t)= f dfly y(y,t)u(y) . (2) 

J f^y 

The unit of the weight function u(x) is [X-unit] ^ [area] ^ and that of v(y) is [Y-unit] 
[area]“^. Thus, the canonical variables U{i) and V{t) are dimensionless. Since the theory 
to be developed is still a stationary theory, the processed data of X and Y are often the 
standardized and detrended anomalies. The area is standardized by R , where R is the 
radius of Earth. 

Weight functions often imply certain physical meanings oiU{t) and V(t). For instance, if 

Area of Hx- 

and if X is dimensionless, such as the standardized anomaly, then U{t) is the spatial average 
of the X field over the domain fix • If X is not standardized, a quantity, such as the average of 
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the spatial standard deviation, should be included in the denominator of the above formula, 
in order to madce U (t) dimensionless. 

The CCA scheme searches for the optimal weight functions u(x) and u(y) so that the 
correlation between U and V reaches extrema. The corresponding weight functions u(x) and 
v{y) are called the canonical correlation patterns. The extrema can be either positive or 
negative. As will be seen later, the canonical patterns will be determined by an eigenvalue 
problem. The orthonormal eigenvectors are unique up to a positive or negative sign. There- 
fore, one can select the eigenvectors so that the correlations are always non-negative, and the 
correlation’s extrema can be regarded as maxima. The details of the eigenvector selection is 
given later. 

The correlation between U and V is denoted by p. Hence, it is our intention to find the 
optimal weight functions u(x) and t>(y) such that 

p = max Corr({7, V). (3) 

Since the weight functions are to be determined, one can always adjust them so that U and 
V have unit variance. The above correlation optimization problem is then equivalent to a 
covariance optimization problem 

p = maxCov(C/,y) (4) 

under a constraint: 

Var(C/) = Vai{V) = 1. (5) 

This optimization problem can be solved by the method of Lagrange multiplier and the 
unknown functions u and v can be determined by integral equations. The Lagrange function 
can be written as 

J[u, v] = {UV) + C [(t/2) - l] + 7, [{K2) - l] . (6) 

Here, C and rj are Lagrange multipliers, which are dimensionless and have a simple relationship 
with p (cL, eqs. (12) and (13)), and (•) denotes ensemble average. 
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The Lagrange function can be written into another form 


J[u,u] = 


f dQ.x f Sxy(x,yMx)u{y) 

Jqx 

+C dQx ^<fnx'Sxx(x,x')w(x)u(x')^ - ij 

+v dfiy ^ dny'Eyy(y,y'MyMy')^ -ij , 


where 


are covariance functions and 


is the cross- covariance function between the X and Y fields. 
The optimization conditions are 


Exx(x,x') = (X(x,t)X(x',f)), 

(7) 

Syy(y.y') = (r(y,t)Y(y',t)}, 

(8) 

Sxv(x,y) = (X(x,t)r(y,t)} 

(9) 


and 




( 10 ) 

( 11 ) 


which lead to the following integral equations for u and v: 

f dQy Lxy(x, y)t;(y) + 2C / , dilx' Exx(x,x')u(x') = 0, 

f dnx Sxy(x,y)w(x) + 2»7 / dny'Eyy(y,y')v(y') = 0- 
Jcix ■’^y' 

In the above, the integrations in each equation leave a free independent variable. The first 

equation’s free independent variable is x in the predictor domain fix and the second equa- 

tion’s y in the predictand domain Qy. These two equations form an eigenvalue problem for 

the eigenfunction (u(x), v(y)) and eigenvalue 4(7?. This eigenvalue is related to the correlation 

p between U and V, It will be shown that the relationship is 

= p. (^2) 
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More specifically, 


C = ^ = -p/2- 


(13) 


Linear integral equations are usually solved by integral transforms or series expansions. 
Here, the series expansion with EOFs basis is used to solve the above integral equations 
(lO)-(ll). 

EOFs are defined for X-field and Y-field, respectively, and they are the eigenfunctions of 
auto-covariance functions, 

[ Sxx(x,x')Vn(x') = A^V'n(x), 

Jnx' 

[ dny' EyvCy, y')4>n{y') = A^0„(y). 

Jily' 

The EOFs are orthonormal and form a complete basis for the vector space of the field. 
Namely, 


/ dO,x — ^mn? 

JQx 


JQx 

oo 


X) V’n(x)V’n(x') = J(x - x'), 

71=1 


where Smn is the Kroneker delta that is equal to one when m = n and zero otherwise, and 
<5(x — x') is the Dirac delta defined as a generalized function. For any continuous function 
/(x) over fix ) it has the property that 


[ dUx S{x - x')/(x) = /(x'). 

J^X 

According to the above relations, the unit of X’s EOF is [area]“^/^, and the unit of is ([X 
unit]^ [area]). The EOFs for the predictand have the same orthogonality and completeness 
properties and similar units. 

The field and weight functions are expanded in terms of EOFs: 


Tl 

(14) 

771 

(15) 
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( 16 ) 


W W = ^ WnV’n(x)/ , 

n 

v(y) = Vm4>m{y)/ {1'^) 

m 

In the above, Xn{t),yn{t), Un, and Un(y) are dimensionless, but the unit of u(x) is {[X unit]“^ 
[area]"^) and the unit of v{y) is ([Y unit]“^ [area]"^). 

The cross-covariance function can also be expressed in terms of EOF s 

Sxy(x,y) = (18) 

m,Ti 

The weighted integrals U and V defined by (1) and (2) now become 


U = J2 ^nXn, V = J2 ^mYm. (19) 

n Tn 

The above summations are infinite series, indices m and n running from 1 to oo. If 
one truncates tti upto q and ti upto p, then an optimization problem of maxp in a finite- 
dimensional spectral space can be obtained. 

In the spectral space, the correlation between U and V and its constraints can be expressed 
as 


because 


p = max Exyv) i 

u'u = l, v'v = 1, 


{XmXn) = 


(YmYn) = S^n 


In the above, the cross-covariance matrix in the spectral space is 


( 20 ) 

( 21 ) 


Exy(m,n) = [(X„y„)], ( 22 ) 

and 

u' = ■ ' j : ^9)* 

The superscript of a matrix indicates the transpose of the matrbc. So, u'v is the dot 
product of two vectors. The dimensions of the cross-covariance matrix are the truncation 
order p of the EOF expansion of X by the truncation order g of Y . 
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In the spectral space, the eigenvalue problems (10) and (11) become 


Exy V = ~2Cu, (24) 

Eyxu = -2r?v. (25) 

Substituting one into the other, one obtains an eigenvalue problem of a symmetric matrix, 

^XY^YX^ = (26) 

or equivalently 

= Av. (27) 

In general, the eigenvalues of B'B are non-negative for any matrix B. Our eigenvalue A is 
specifically equal to the square of the correlation, i.e., A = The proof of this is simple. 
Multiplying (24) by u' and (25) by v', then multiplying the two resulting equations, and 
finally using (20) and (21), one can deduce that = A = In addition, one can have 

p = -2C = -2r?, Exy V = pu, Ey^u = pv. (28) 

The correlation between two fields reveals the phaise differences of the two fields. When 
positive anomalies of U and V appear at the same time, then the correlation is one. According 
to the definition of the correlation p, the value of p must be between --1 and -Hi. The 
correlation-eigenvalue relationship p^ = A has two solutions p = ±A. Since both u and —u 
are both eigenvectors of the eigenvalue problem (26), one can choose the sign of u so that 
p = {U{t)V{t)) > 0. Next section will show how to choose the correct sign. 

3 DISCRETE CCA WITH AREA-FACTORS 

The above continuous theory can only become computationally possible when the covariance 
functions ExXj^yy smd Sxy, or Exy are known. However, in almost all the climatological 
applications, they are unknowns. A discrete version of CCA must be considered. Here we 
derive the discrete CCA from the integral version described in the previous section, hence the 
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area- factor is automatically included. The inclusion of the area- factor is important when the 
data are defined over the uniform latitude-longitude grid boxes and when the domains Qx 
and/or Hy contain high latitude regions, since the data over the low latitude boxes represent 
larger areas and are, thus, weighted higher. The details of the covariance matrices and EOFs 
with area-factor are described in Shen et al. (1998). 

The exact eigenvalue problem is 

[ p(x, (29) 

Jn 

Here '0jfe(x) is the kth EOF (or mode) and Ajt is the variance (eigenvalue) of X{x) on the fcth 
mode (fc = 1,2, ‘ • •)’ approximate eigenvalues of the above continuum eigen-problem can 
be estimated by a discretization procedure given by 




where 


\Pij] = \\fAi 

is the covariance matrix with area-factor, and 


(30) 


(31) 



(32) 


are the eigenvectors with area-factor and Aj is the area associated with the grid x,. For 
uniform latitude-longitude grid boxes, one has 


Aj = cos<pjA0A4> 


(33) 


where <pj is the latitude of x, and the A6 and A4> are the zonal and meridional box dimensions 
respectively, which are measured in radians. The linear spatial unit (i.e., the length unit) is 
in the radius of earth: i? = 6376 km. 

With the above preparation, if the area of the region associated with Xj is denoted by Ai 
and that associated with y, by Bj, the covari^ce matrices with area-factor are 



( 34 ) 
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Wy — 




(35) 

(36) 


The discrete EOFs are and e^. They are related to the continuous EOFs by 

e^{i)=Tpn{^)y/Ai, e^{j) = <l>miyj)\/^j- (37) 

Here e^(i) is the ith element of the vector e^. The magnitude of is equal to one, since 
IZ (®n (^))^ = E (V'n(Xi)\/Ij)^ W f d£lx = 1, 

t=l t=l 

where Jx is the total number of grid points in domain fix- 

In practice, the discrete EOFs are computed from the eigenvalue problems of the covari- 
ance matrices with area factors. 


The correlation p defined by (3) has a discrete version 

p = Corr(X,l u^,Yb - Vs), 

(38) 

where the vectors are defined by 

X^(0 = [v/i;X(xi,t)] , 

(39) 

Ys(t) = [v/^y(yi,0] , 

(40) 

= [\/^u(Xi)j , 

(41) 

vfi = [\/^v(y.)] • 

(42) 

The maximization of the correlation p can then be realized by maximizing the 

covariance 

P = UA^xyVfi, 

(43) 

under the constraints 

^A^XX^A = 1, 

(44) 


(45) 
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In order to find the maximal correlation p by solving a single eigenvalue problem, both the 
fields and weight functions are expressed in terms of discrete EOF. The EOF expansions are 


Tl=l 

(46) 

771=1 

(47) 

UA » X^“nen/\/^. 

71=1 

(48) 

vs w y) Wme^/ 

m=l 

(49) 


The discussion about the truncation orders p and q is given in the appendix. 

The requirement 

. Kx I 

^ ^ ^Tn(f)TJx(f) = Smn 

Xx ^ t=l 

forces the lengths of the data streams for predictor and predicatand to be the same, i.e., 
= K . The covariance matrices between the data point i and the data point j are 


computed by 

(50) 

^ t=l 

= ( 51 ) 

(52) 

^ t=l 

(53) 


The cross-covariance matrix in the EOF space is 

— [(‘^nTin)]px9 * 

Its transpose is 

The maximization of the correlation p leads to the following eigenvalue problem for u: 

ExySyxu = (56) 
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Similarly, the eigenvalue problem for v can be derived as well: 




( 57 ) 


In practical computing, the cross-covariance matrix in the EOF space is calculated by 
time average 

(58) 

^ f=l 

where Xn{t) is given by one of the following tl^ee expressions 


Xnit) 


(*)^ (Xi , t) y/Ai 

i=l V-^n 


= X] -7=V'n(x«)-X’(Xi,<)i4i. (59) 

i=l \/^n 

Using what expression depends on the available types of EOFs. The quantity ym(0 can be 
computed in the similar way. 


4 MULTIVARIATE REGRESSION FOR ANOMALY PRE- 
DICTION 

Our prediction method is a linear regression based upon the data of the predictor and predic- 
tand. The predictand is a lineair combination of the predictor. The predictand and predictor 
can be defined either on points or regions, although they are often defined on grid points. 
Of course, the regression is meaningful only when there is an approximate linear relationship 
between the predictor and predictand. The regression is statistically significant only when 
the scatter diagram of the data are not too far away from the regression line (or the regres- 
sion plane in the case of multivariate). This condition is equivalent to that the correlation 
between the two fields is laurge. 

In the case of single- variate, the regression line 

y — bx + xo 
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is statistically significant, only when the data point (xi,yt) are close to the line. The slope is 
b = fXTyjax, where p is the sample correlation, and Cy are the sample standard deviations 
of the x-data and y-data, respectively. How close the points are to the regression line is 
measured by the sample correlation, p, computed from the data. The closer the absolute 
value of p is to one, the closer the points to a line, and consequently, the more significant the 
regression. For example, when having 15 data points, the 5% significance level requires the 
correlation not less than 0.514. 

Our climate prediction uses multivariate regression, which is a linear approximation to 
a nonlinear relationship. The correlation between the two fields is defined in the previous 
sections, and computed by the CCA algorithms. Our linear approximation is optimal in the 
sense of the maximization of the correlation between the two fields. 

Our linear prediction in the physical space can be written as 

= (60) 
j 

Here, the predictand is at time t' = t+ M, the predictor is at time t, the “slope” matrix B is 
to be found, and the “intercept” vector is set to be zero since both Y and X are anomalies. 
We will use the canonical variables and canonical correlation patterns, together with EOFs, 
to express the fields and then find the “slope” matrbc B. The correlation between the X-field 
and y-field is the key parameter in the regression. 

If the truncation order p > q, then the correlation between and is zero 

when k> q. The same is true for the case q > p- Thus, we always make EOF truncation order 
for X and Y be the same: p = q- The CCA eigenvalue problem has p non-zero eigenvalues. 
When s = 2p-A-tl>0 (where K is the length of the data stream), the first s eigenvalues 
are equal to one, which implies a perfect linear relationship that is certainly impossible. Thus, 
choosing a large p may result in over-fit. In this paper, we always choose p < {K - l)/2 and 
none of the eigenvalues is equal to one. 
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The p eigenvectors are mutually orthogonal. The canonical variables are 

= (61) 
n=tl 

yW(i') = ^ A: = 1,2, • • • ,p. (62) 

n=l 

Now, the sign of needs to be determined to guarantee the positivity of pjt- One can 
calculate p using 

^ t=l 

If is positive, then pk = 7 * and does not change sign. Otherwise, 7 ^ < 0, then 

Pj^ = — 7 ^ and is changed to — After this operation, the correlation at each mode 

(A:) is non-negative. 

It is worthwhile to note that the operations above are among the dimensionless variables. 
Because the matrices [un^]pxp and [vm^]pxp are orthogonal matrices, their inverses are equal 
to their transposes. Such an inverse operation for a matrix is physically meaningful only 
when the matrix is dimensionless. 

The inverse of the matrix equations (61) and (62) leads to 

X„(t) = f3«Wt^W(0. (63) 

fc = l 

K.(0 = E«^‘VW(t'). (64) 

fe=l 

The field X in the spectral space is expressed by the sum of the products of the canonical 
variable and eigenvector The eigenvector is thus called the canonical pattern 
in spectral space. Its physical space correspondence is given by formula (48). 

Please note that, for the purpose of one-tier prediction, the predict and timet' is larger 
than the predictor time t by At and, thus, the cross-covariance matrix T^xy in this section 
is a lagged cross-covariance. The lag time is t' — t = At. 

The vectors • • • , and ^ y{p)y are all in p-dimensional vec- 

tor space. There must exist a unique transform matrix which transform one to the other. 
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Let this transformation be written as 


The formulas 


imply that 


= + = 1,2, •••,?. 

(65) 

= pkSki, 

(66) 

(t/«)=0, {{u^’‘^Y) = i, 

(67) 

(yW) = 0, ((fW)^) = 1 

(68) 

= fc= 1,2, 

(69) 


The prediction is therefore made for t' = t At when the predictor at t is known: 


X{x,t)- > Xn{t)~ > > Ynit')- > Y{y,t'). 


The equation (69) is thus the main prediction equation. The predictand at t' is 

y(y, t') = t ft yf^Uy)- (70) 

n=l \k=l / 

This is the one-tier prediction scheme: knowing the predictor at time t and predicting 
the predictand at t' = f + At. The CCA scheme can also be applied to the so called two- 
tier forecasting: forecasting the predictor from time t to time = t + At by a method and 
predicting the predictand at time t' from the predictor also at time f' by another method. 
CCA can also be used as the latter method. The CCA scheme is exactly the same, except 
that the lagged cross covariance matrix should be replaced by cross-covariance matrix without 
time lag, i.e. At = 0. 


5 ESTIMATE OF THE PREDICTION ERROR 

This section estimates the mean square error (MSE) of prediction. The importance of this 
error estimate is three fold. First, it gives an instantaneous assessment of the forecasting 
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quality. The smaller the error, the more accurate the forecast. Second, the MSE is directly 
related to the correlation skill of the prediction which is often used to assess the quality of a 
prediction. Third, the MSE errors are needed to compute the optimal weights for combining 
the multiple forecasts to form an optimal ensemble forecast. 

The CCA scheme requires the fields to be stationary. The computing algorithm requires 
the fields to be ergodic, i.e. the ensemble mean can be replaced by time mean for a long 
time. Our estimated error, as an ensemble value, cannot reflect the fluctuation of the fields 
according to time. However, if the data used in the prediction are changed, then the error 
should reflect this change. One may regard an observational network as a filter. When a filter 
is changed and the filtered results should change accordingly. Thus, the time dependence of 
the error is on the observational data and the data processing method, not on the intrinsic 
fluctuations of the fields. Thus, for a given observational network and a given data- processing 
method, our forecasting error is time independent. 

The error field is 

«^(y) = ((^(y.O-^(y.*'))\ (7i) 


where V is the unknown true field to be predicted, and Y is the prediction that approximates 
y. The prediction skill of V is often measured by its correlation with the original field y, 
defined by 




(72) 


From above two equations one can derive 



(y2) g2 

V (y^) \ 



(73) 


Several special cases are interesting. Case 1 is the perfect prediction: € = 0 implies r = 1. 
Case 2 is the worst wrong prediction: = (y^) + (y^) implies r = 0. And case 3 is the 

random prediction: implies r = (l/2)y^(^^^)7(y^. A sub-case of this is — 0, 

which is a useless forecast. In other sub-cases, it is unlikely that {Y^) is very small, hence 
the correlation r is also very small. 
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( 74 ) 


The MSE can be estimated by using EOFs. The error can be written into two parts 

e^(y) = «^(y) + 4(y)> 

where 

^?(y)=E4^^(y) (75) 

m=l 

is MSE computed from the first q modes with 

4=((>^m(0-'f;n(t'))'). (76) 

and the residual e^(y) is the contribution of higher modes to the MSE: 

oo 

4(y)= E ^-<^m(y)- (77) 

m=<7+l 

Apparently, this residue error depends on the truncation order, which, in turn, depends on 
the resolution of data and the spatial scale of the field. The higher the data resolution, the 
larger the q value. In practice, one uses discrete data and obtains only a finite number of 
modes, say, q'. The prediction uses the first q modes among the q' > q modes. Then, the 
residual error may formally written as 

E ■^m</’m(y)- 

77l=g+l 

This q^ is usually chosen to be A", the length of the data stream. However, when the data 
streams are not sufficiently long, they cannot resolve the EOFs of very high order. Because 
the higher order eigenvalues are close to each other, the computed EOFs can be very much 
distorted from the true ones (See North's rule-of-thumb, North et al., 1982). Thus, the above 
estimate of residual error can be unreliable if the data streams are not sufficiently long. 

Usually, the q modes of the predictand should resolve at least 50% total variance. The 
residual error should be smaller than or comparable to the prediction error of the first q modes. 
With the omission of the residual error in the computation, the error e^(y, t') computed based 
on the first q modes can be regarded as the lower bound of the MSE and its double may be 
regarded as the upper bound of the MSE. 
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The average MSE of the error field is the integral of 



( 78 ) 


where A is the total area of the domain Qy. When using the first q modes, the average MSE 
error is 

m=l 

This MSB’s unit is still [Y unit]^, as one expected. 

The error for the mth mode is computed as following 




>~li 


k=l 


E 

k=l 


-«[/<«) v«) 


k=\ 


Jfc=l 


(80) 


The error is scaled down for higher EOF modes of the Y-field by its eigenvalues since 
A^ — ► 0 as m — > oo, but scaled up by the CCA eigenvalues because >oo. 

If the correlations are large, say, close to one, then tm ~ 0. If the predictability is low, 
then the correlation is small If the largest correlation is even less than 0.4, the factor 1 - pf 
is larger than 0.84, which is pretty large. If both the predictor and predictand are spatial 
white noise, then there is no correlation between the two fields, and p| = 0 and 


= A^ 


and 

T7l 

Namely, the prediction error is equal to the total variance of the field and the prediction skill 
is zero. 
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6 OPTIMAL ENSEMBLE OF MANY FORECASTS 


Different data sets of predictor and predictand may yield many forecasts, such as the forecasts 
of precipitation using SST, soil moisture, SLP, SLP, or SSX from different ocean basins, etc. 
An optimal weight needs to be assigned to each forecasting result at every grid point in order 
to form an optimal ensemble forecast. This is similar to the superensemble idea proposed by 
Krishnamurti et ed. (2000). 

This section will show the method to find the optimal weights and the resulting predic- 
tion error. The spectral method is again used in our computing. The use of the spectral 
method requires stationarity assumption. Because of the requirement, which may be true in 
a short period, the forecs^ting naturally deteriorates rapidly as the lead time of the prediction 
increases. 

The spectral representation of the predictand is 


Y{y,t) = '£^Yr,{t)4>n(y)- 

n=l 


(81) 


Similarly, the forecaster’s representation is 

Y(y,t) = \f^Yn{t)^niy)- ( 82 ) 

n=l 

Suppose that there are M forecasts. The ensemble forecast for the principal component Yn{t) 




(83) 


/l=l 


where Wn^\ to be determined, is the weight for model k and mode n. The weights satisfy a 
constraint 

K 

(84) 


h-i 

but they do not have to be all positive. 

The MSE forecasting error is a function of y and is written as 

■B^(y) = ((^(y.O - ■^(y.*))^ 
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= ( 


°° I ^ \ , — 

E Y-{t) - E v^«An(y) >. 


( 85 ) 


Ln=l \ h=l / 

The principal components of different modes aire independent. The errors Yn — Yn (n = 
1, 2,3, • • •) are assumed to be independent. Then, the error function becomes 


where 


E\y) = Y,El4>l{y), 


El = A^((^r„(t) - E 


) 


(86) 


(87) 


is the MSE error for forecasting the nth principal component Yn{t)> 

The forecasting errors are caused by noisy data or inaccuracy of the forecasting model. 
For each EOF coefficient, the forecasting errors for different prediction models are assumed 
uncorrelated. With this assumption, one then has the following 


= ( (E {Ynit) - ) 


where 


/i=l 


(€W)' = ((r„(^)-^;W(^))^ 


( 88 ) 


(89) 


Minimization of expressed as above under the constraint (84) leads to the optimal weights. 
The weight for a model is inversely proportional to the model’s forecasting error. Namely, 






Then the resulted MSE error for the mode n is 


(90) 


= (E !/(<?>)’) 


(91) 


That the errors from different prediction models are independent is an important assump- 
tion. The different models usually mean different predictors, but the same prediction scheme. 
Thus, to ensure the independence of the errors, one should choose the predictors which are 


21 



not strongly correlated. This is intuitively correct, since the information from the similar 
sources is redundant. 

A special case is when all the models have the same amount of errors, In this 

case, the optimal weights are all equal to \/H and the resulting error is 


2 = 


(92) 


This is the conventional error-formula of homogeneous statistics. 

The method of finding optimal weights here is slightly different from that of Krishnamurti 
et al. (2000), though the purposes of the two raiethods are the same: optimal combination of 
multiple forecasts. Krishnamurti et al. (2000) used the multivariate regression to determine 
the weights, and they did not require the sum of the weight to be one. Since their regression 
is for the best fit of the linear superensemble model to data, the forecasted expected values 
are approximately the same as the observed expected values and hence the constraint of the 
sum of all the weights equal to one is not needed. But in our case, the constraint is needed 
to guarantee that the forecasted expected values is approximately the same as the observed 
expected values. Otherwise, it may happen that the weights are too large, and, consequently, 
the forecasted expected value is out of the range of possible climate and hence the forecast 
is not valid. Despite the difference between the present method and that of Krishnamurti et 
al. (2000), the results are actually close. We applied the formula (91) to the Y-component 
of their Fig. 3(b), the formula (91) provides an RMSE 1.7, which is about the same as that 
of Krishnamurti et al. (2000) . 


7 CHARACTERISTICS OF THE US PRECIPITATION 

To predict the seasonal precipitation over the US, it is helpful to describe its basic characteris- 
tics. Higgins et al. (1996) made an atlas of the US monthly precipitation, which is a valuable 
reference for forecasting research. The US seasonal precipitation is influenced by many fac- 
tors, including sea surface temperature (SST), sea level pressure (SLP), soil moisture, wind 
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direction, wind speed, and relative humidity. The cold winter- air mass from Canada often 
trigger heavy snow fall along the US snow-belt. The warmer SST over the eastern Pacific 
often results in a positive precipitation anomaly along the coast of California, particularly 
during an El Nino event. The anomaly of the Atlantic SST is often the energy source of 
the Atlantic hurricane, which, when it lands in the east coast of the US, makes a significant 
contribution to a monthly precipitation of the eastern states, including North and South 
Carolinas, Virginia, and sometimes, Texas, Georgia, New York, and Maine. 

The US annual precipitation has two centers. A center of a larger area is around the 
central Gulf Coast states, including Louisiana, Mississippi, and Alabama. This area’s precip- 
itation almost uniformly distributed from January to December, with slightly lower amount 
in October and slightly higher amount in July and August. This area’s precipitation is 
mainly influenced by the low pressure system over the northern part of the Gulf of Mexico. 
The winter precipitation is affected by winter cyclones. The spring precipitation is shifted 
a little toward the west, over Arkansas and Oklahoma, where the Canadieui cold air often 
brings down the moist transported from the Gulf of Mexico and triggers severe convective 
storms. The P RE- STORM experiment of the summer of 1985 in Oklahoma was designed 
to measure this type of precipitation. The summer precipitation is also influenced by the 
weather systems of the Atlantic, such as large scale hurricanes. 

Another center of a smaller area is the northwest US, including only the Pacific Northwest 
coast states of Washington and Oregon. This region is small, but has the largest amount of 
precipitation comparing to other regions of the US. This area’s precipitation, different from 
that of the southeast center, is highly non-uniform firom January to December. It has a strong 
seasonal cycle, with maximum precipitation in winter, minimum in summer, and moderate 
in spring and fall. The winter precipitation is mainly caused by the confront of the northwest 
cold air jet from Alaska and the westerly Pacific jet. The latter carries much of water vapor. 
Many winter days have more than 5 mm/day precipitation rate and make the area the wettest 
region of the conterminous US. The westerly Pacific jet becomes weaker in summer and shifts 
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to higher latitude, and hence it has stronger influence to British Columbia than Washington. 
This shift dramatically reduces the supply of water vapor to the Washington and Oregon 
area and hence reduces the precipitation amount. 

Of course, interannual variations of SST also influence the US precipitation (Ropelewski 
and Halpert, 1986; and others). During the El Nino event, the precipitation of California 
has a strong positive winter anomaly, and that of the southeast US has a strong negative 
spring anomaly. As the El Nino signal is the strongest in December, the CCA should jdeld a 
high skill over California if the December’s SST is used to predict the January precipitation. 
The CCA should also yield a high skill over the southeast US when the November-December- 
January SST is used to predict the February-March-April precipitation. The cold Pacific SST 
anomaly, such as La Nina, results in a negative winter precipitation anomaly in the southwest 
US, while the cold events’ influence in other regions and seasons is pretty much scattered. 

With the above information, one can be certain that the US precipitation is correlated to 
the surface temperature (particularly, the SST) and pressure field. These properties provide 
the predictability and our optimal forecasting scheme is designed to reflect the predictability 
and to yield high quality prediction. Of course, our method can only reflect an linear ap>- 
proximation of the nonlinear relation between the SST and US precipitation. Although the 
linear approximation has limitations, it has already provided meaningful results. Barnston 
and Smith (1996) used the BP-CCA to predict the global precipitation using the global SST 
and found significant prediction skill over North America at least in the winter seasons. 

8 DATA SETS AND THE RESU LTS OF THE US PRECIP- 
ITATION FORECASTING 

8.1 Data set$_ ^ . 

Our predictor’s data set is the historical monthly SST reconstructed by Smith et al. (1996). 
The spatial coverage is from 45° S to 69°N. The spatial resolution is 2 x 2 . The temporal 
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coverage is from January 1951 to December 2000. The temporal resolution is a month. 
The data are available via ftp at: ftp://ftp.ncdc.noaa.gov/pub/data/ocean/rsst/. Our linear 
regression scheme cannot resolve the nonlinear process imbedded in small scales. Thus, the 
2° X 2° SST data are further averaged into a data set of 4rdegree latitude by 6-degree longitude. 
This average removes the noise of smaller scales. 

The predictand’s data set is based upon the latest version of the Xie-Arkin global land 
monthly precipitation data set as reported in Chen et al. (2001). The temporal coverage of 
the precipitation data is from January 1948 to December 2000, but only the part overlapped 
with the reconstructed SST period is used in our forecasting calculation, namely, January 
1951 - December 2000. The spatial resolution is 2.5° x 2.5°. The gridded data are resulted 
from the interpolation of over 17,000 stations collected in the Global Historical Climatology 
Network (GHCN) Version 2 and the Climate Anomaly Monitoring System (CAMS). The 
data are available via ftp at: ftp.ncep.noaa.gov/pub/precip/50-yr. 

The anomaly data are centered around to the 1961-1990 climatology. These anomaly data 
are further detrended and standardized before they go into the EOF computing. After the 
forecasting on the standardized data, the forecasting results (for dimensionless quantities) 
are converted back to the anomaly data with dimension. However, the conversion uses only 
the mean and standard deviation, and the trend is not included. The forecasting results are 
then compared with the observed detrended anomaly field to assess the forecasting accuracy. 

8.2 Forecasting results 

The 50 years (1951-2000) SST and precipitation data are used in our forecasting experiments. 
Hence 49 years data are used for computing EOFs for both SST and precipitation, and these 
data are used to predict the remaining year’s precipitation. Our scheme requires that the 
lengths of the data streams for both SST and precipitation be the same. The reconstructed 
SST data are only for the preiod of 1951-2000, the precipitation data before 1951, although 
available, are not used in our forecasting experiments. EOFs of the US precipitation are 
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presented for two seasons: DJF (winter, Fig. 1) and JJA (summer, Fig. 2). For both 
precipitation and SST fields we show only the first six EOF modes. The number of modes 
used for actual forecasting is ten or another number. 


It is well known that the characteristics of summer precipitation is very much different 
from that of winter. The EOFs clearly reflect the difference. For example, the DJF EOFl 



Figure 1: First six EOFs of US precipitation in DJF. The computation used the three-month 
mean precipitation data, which had been detrended and standardized. 


26 







Figure 2: Same as Fig. 1, except for the summer season JJA. 


demonstrates a strong north-south pattern, while the JJA EOFl supports a northwest- 
southeast pattern. The differences are also reflected in higher modes. 

The CCA patterns are linear combinations of the EOFs. The interesting part is that 
the first two winter CCA patterns can sometimes be highly mode-selective, preferring only 
a few from the six retained EOF modes. When performing the SON (SST field)- DJF 
(precipitation) CCA analysis, the first SST CCA pattern is mainly contributed by 0.75 EOFl 
and 0.61 EOF2. Since the first SST EOF is the El Nino mode, the first SST CCA pattern 
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looks like the El Nino mode plus some North Pacific oscillation (see Fig. 3). The contribution 
from other EOFs is negligible. The first precipitation CCA pattern is mainly contributed by 
0.68 EOFl, 0.38 EOF2, and 0.34 EOFS (Fig. 3). The contribution from other modes is small. 
The northwest positive is mainly contributed by EOFL 




Figure 3: First two canonical correlation patterns for SON SST and DJF US precipitation. 
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Figure 4: First two canonical correlation patterns for MAM SST and JJA US precipitation. 


With this pair of CCA patterns, the CCA correlation between the SST and precipitation 
field (i.e., p\) is 0.87, a pretty large value indicating that a good forecasting result is expected. 
However, the summer CCA patterns are more uniformly distributed among the ten retained 
EOFs and do not have apparent dominant modes. For example, the first precipitation CCA 
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pattern selects not only EOFl and EOF2 , but also EOF6, EOF7, EOF8 and EOF9 (Fig. 4). 
Comparing Fig. 4(a) and Fig. 3(a), one can see that DJF’s first canonical pattern is much 
more coherent than that of JJA. The same is true for the SST canonical patterns (see Figs. 
3(b) and 4(b)). This reflects the difficulty of predicting the summer precipitation. 

The 2000 DJF precipitation is predicted by the 1999 SON SST. The observed precipita- 
tion anomaly is shown in Fig. 5(a) and the predicted anomaly in Fig. 5(b). The predicted 
auid the observed anomalies agree in most of the southeast 2 irea. The spatial pattern corre- 
lation, defined by the formula (93) below, between the predicted and observed is 0.47. The 
corresponding pattern correlation is higher for El Nino or La Nina years, but the year 2000 
was neither an El Nino nor an La Nina year. 

The DJF precipitation forecasts are made for every year from 1951 to 2000 by using 
the previous season’s (i.e. SON) SST. The pattern correlation and the Heidke skill score are 
shown in Fig. 6. Heidke score accounts for only the correct atnomaly sign. Pattern correlation 
accounts for both sign and amplitude. Only the perfect forecast, with both correct sign and 
amplitude, has pattern correlation equal to one. Fig. 6 indicates that our forecasting results 
are reasonable. 

The spatial pattern correlation is the correlation between the forecasted (i^) and observed 
anomalies {R() for a given season, 



When the pattern correlation reaches 0.4, the forecasted field has a good similarity to the 
observed. Among our 50 years of forecasts, 29 years have the pattern correlation equal to 
or greater than 0.4. Our maximum pattern correlation is 0.78, which indicates an extremely 
good forecast (see Fig. 6(a)). 

The Heidke score (HS) is another commonly used index to indicate the prediction skill 
(van den Dool et al, 1997). The two-class HS is defined as 

HS = X 100, (94) 
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Figure 5: Observed and predicted precipitation field of DJF 2000. The prediction was made 
from the previous season SON SST. The time lag is hence one season. The unit is [mm]/[day]. 

where T is the total number of forecasts (i.e., the total number of grid points in our case), E 
is the expected value of correct random forecasts (i.e., E = T/2), and H is the actual number 
of correct forecasts. Since the two-class Heidke score is used, a forecast is defined as correct 
if the forecasted anomaly has the same sign as the observed anomaly. The maximal value of 
HS is equal to 100, when every forecast is correct. The HS score for random guessing is zero 
because H = E, hence the random forecasting model does not have a skill. The HS reaches 
its minimum value if every forecast has missed and the score is -100 x E!{T — E), which 
is equal to -100. Our forecasts show positive skill scores in 45 years in the total of 50 years. 
The maximum Heidke score is 76 (see Fig. 6(b)). 


31 





1 

0.8 

0.6 

0.4 

0.2 

0 

- 0.2 

- 0.4 

- 0.6 

- 0.8 

-1 


100 

80 

60 

40 

20 

0 

-20 

-40 

-60 

-80 


1955 1960 1965 1970 1975 1980 1985 1990 1995 2000 


Figure 6: Forecasting skills of using previous season SON SST to predict DJF precipitation: 
(a) pattern correlation (defined by eq* (93)) ^ and (b) Heidke score (defined by eq. (94)). The 
EOFs were computed using the data from 1951-1999. 


The summer precipitation skill is consistently lower than that for winter. Among the 
50 years of experiments, there are only 18 years whose the pattern correlations are equal 
to or greater than 0.4. The skill varies strongly with near-zero skill in many years, but the 
maximum skill can still be high. In our example, the maximum skill is 0.83. The pattern 
correlation for the 2000 prediction is very low and the foreceist is not good. We have chosen 
to show a forecast of a bit less-than- average skill. This is 1997. The training period of the 
predictor and predictand is the 1951-1996. The 1997’s JJA precipitation is predicted using 
the same year’s MAM SST. The observed JJA precipitation is shown in Fig. 7(a). The 
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predicted result is shown in Fig. 7(b). Comparing these two figures, one can see that the 
forecasting skills are mainly from the Northeast and Southwest. In almost 50% of the US, 
the forecast has a wrong sign. Both the pattern correlation (0.18) between the forecasted 
and observed and the Heidke skill score are low (almost zero) (the most right bar in Fig. 8). 
This is a more detailed indication of the difficulty to forecast the summer precipitation. 



Figure 7: The observed and the forecasted 1997 JJA precipitation. The predictor was the 
1997 MAM SST. 

Of course, for actual forecast one does not know the observed precipitation. The reliability 
can be assessed by the expected value of prediction errors as discussed in Section 5. Fig. 9 
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shows the prediction root mean square errors (RMSE) for two seasons: DJF and JJA. The 
predictor is the previous season’s SST. The errors measure the mean square difference between 
the actual and forecasted precipitation. In the region where the precipitation variance is large, 
the error is also larger. The variance is proportional to the precipitation climatology. The 
DJF error map has a Northwest-Southeast pattern (Fig. 9(a)) and the JJA error error map 
has a Northeast-Southwest pattern. These patterns are consistent with US precipitation 
climatology (Higgins et al., 1996). 




Figure 8: Forecasting skills of using MAM SST to predict JJA precipitation: (a) pattern 
correlation (defined by eq. (93)), and (b) Heidke score (defined by eq. (94)). The EOFs were 
computed using the data from 1951-1996. 

The RMSE error is the expected value. The regions of large errors are likely to have the 
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wrong prediction. But for an individual forecast, the region of a large error may still have a 
good forecast. 




Figure 9; Expected values of the forecasting error when predicting the (a) DJF precipitation 
using the previous season SON SST, and (b) JJA precipitation using the same year’s MAM 
SST. 

Optimal combination of two forecasts may enhance the forecasting skill, but one has to 
be careful when doing so. Since both weights are positive, the optimal combination is an 
interpolation process. This process prefers two good forecasts. Thus, one should combine the 
two forecasts of reasonably high Heike scores. Fig. 10(c) shows the optimal combination of 
the 2000 DJF precipitation forecasts from the previous season’s SST using the Pacific (Fig. 
10(a)) and Atlantic oceans (Fig. 10(b)), respectively. The pattern correlation and Heidke 
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Figure 10: a) Predicted 2000 DJF precipitation using the previous season SON Pacific SST, 
(b) Predicted 2000 DJF precipitation using the previous season SON Atlantic SST, and (c)the 
optimal ensemble forecast, which is the optimal combination of the CCA forecasting results 
from (a) and (b). The spatial pattern correlations between the predicted and the observed 
(Fig. 5a) ore 0.43 for (a), 0.04 for (b) and 0.44 for (c). 
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score for Pacific forec^ are 0.43 and 35, respectively. And those for Atlantic are 0.04 and 31. 
The optimally combined forecast is shown in Fig. 10(c). Its pattern correlation and Heidke 
scores become 0.47 and 40. 

Compared with the observed precipitation in Fig. 5(a), the combined forecast has a better 
skill and is more similar to the observed than the two individual forecasts. If both forecasts 
are very good, the optimally combined forecast can have a much higher skill. 

9 SUMMARY AND CONCLUSIONS 

This paper contains a new development of the CCA forecasting scheme. This further devel- 
opment includes three parts. One is the inclusion of the area factor, and this procedure is 
very important to reduce the noise from the data of higher latitudes. The second is the error 
calculation formula, which estimates the mean square error of the forecasts and hence gives a 
quantitative quality assessment of the forecasting, without using cross validation. The error’s 
spatial distribution reflects the physical interaction mechanisms between the predictor and 
predictand. The third is the optimal combination of the multiple forecasts. The third is 
possible only when the error estimates are made in the second. 

Our forecast experiments show that the new ensemble canonical correlation scheme ren- 
ders a reasonable forecasting skill. For example, when using September-October-November 
SST to predict the December- January-February precipitation of the next year, the spatial 
pattern correlation between the observed and predicted are positive in 46 years among the 
50 years of experiments. The positive correlations are equal to or greater than 0.4 in 29 
years, which indicates excellent performance of the forecasting model. The forecasting skill 
can be further enhanced when careful selection of ocean ba,sins is made and more predictors 
are used. 

With the results presented in this paper, we may conclude that (i) the new improved 
CCA scheme is an effective prediction tool, (ii) the error estimate of the new prediction tool 
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is crucial at finding the optimal ensemble forecasting, (iii) other predictors, such as sea level 
pressure and soil moisture, should be considered, (iv) the GCM simulation data may be used 
to find the CCA statistical structure, and (v) further research is needed to determine the 
optimal EOF truncation order. 

Of course, precipitation over some areas are not directly related to SST anomalies, but 
more related to SLP or other predictors. Zorita et al. (1992) also used CCA to study the 
precipitation predictability of Iberian precipitation from both SLP and SST data. They 
found that the dominant Iberian precipitation variability is caused by the westerly wind and 
hence SLP patterns, but not the regional or remote SST anomalies. The same can be true 
for some regions of the US precipitation. However, our new CCA model has much flexibility 
and can easily include several predictors. Using multiple predictor can be an effective way to 
increase the forecasting skill. Therefore, a careful tuning of the model and a careful selection 
of predictors, such as soil moisture and SLP, can enhance the skill of prediction. Thus, our 
next project is to include other predictors in our optimal ensemble forecasting model. 

The prediction skill can be further enhanced by using an extrapolation type of optimal 
combination of forecasts. Some of the forecasts with negative temporal correlation signs can 
also help with enhancing the skill of the optimal ensemble forecasting skill. Some of the 
weights of the forecasts can be negative. The weights are computed from a linear regression 
scheme similar to that of Krishnamurti et al. (2000). The assumption of independence of 
the forecasting errors of different models is no longer needed. This work is deferred to future 
studies. 
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APPENDIX A CONVENTIONAL CCA 


This appendix describes the CCA scheme without consideration of the area-factor. This 
is the scheme widely used in social sciences, finance, climatology, statistics, and other fields. 
Emphasized here is the symmetry of the matrix, whose eigenvalue is the maximal correla- 
tion between two fields. This symmetry guarantees that the eigenvalues are real numbers. 
However, in many meteorological literature, this symmetry has been ignored. 

A.l. Conventional CCA and an example 

The random vectors X and Y represent climate parameters at different points. Linear 
combinations of the elements of X and those of Y form two scalar random variables. The 
linear combinations can have various kinds of physical interpretations, depending on how 
a linear combination is made. For example, the following linear combination from uniform 
weights 


^ 1 


i=l 


is the arithmetic average of the field and can be simply written as a'X, where 


a' = (1/iV 1/N 1/^) (a row-vecter), X = (Xi X 2 (a column- vecter). 


The objective of CCA is to find vectors a and b such that 

p=Corr{U,V) 


(95) 


be maximized, where 

U = a'X, V = b'Y (96) 

are called canonical variables. 

If the correlation is large, say, 0.7, we may say that the parameters X and Y have 
high mutual predictability. Then, a multiple-variate regression procedure may be used for 
prediction. 

The CCA scheme is derived from 

maxCov(?7, Y), (97) 
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under the condition: 


Va.T(U) = Var(F) = 1. 


(98) 


To proceed with the computings the following covariance matrices are either given or com- 
puted from data: 

Sxx = Cov(X,X), Eyy = Cov(Y,Y), = Cov(X, Y), Eyx = E'xy* (99) 

The conventional CCA scheme requires that xx and Eyy be of full rank and can be 
diagonalized. However, because of the temporal length of the climatological data streams is 
often shorter than the spatial data points, the covariance matrices are often not full rank. 
Hence, the spectral method is used to overcome this problem, since the eigenvalue problem 
of the CCA in the EOF space is a problem of symmetric matrix. As a matter of fact, the 
CCA problem in the EOF space is symbolicadly the same as the SVD problem in the physical 
space. 

The CCA algorithm includes the following three steps. 

Step 1. Diagonalize matrices Hxx = and Eyy = PyAyiV> and find 

yi — 1/2 y — 1/2 yl/2 yil/2 

where Hxx defined as P^xA^Py for a real number m, and Eyy is similarly defined. 

Step 2. Solve the eigenvalue problem 

= p^e. (100) 

Step 3. Find the canonical variables. The linear combination vectors are 

a = E-'/V (101) 

b = (102) 

The canonical variables are 

U = a'X, V = b'Y, (103) 
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and their variances are equal to one. Their maximal correlation is pi, the first eigen- 
value. 


Example. Find the maximal correlation between the linear combinations of two two- 
dimensional vectors 

x = (xi,x2), Y = (n,y2), 


Eyy = 


^XY = 


(104) 


(105) 


(106) 


if the covariance matrices are as follows: 

r 

1.0 0.5 

Sxx = 

' 0.5 1.0 

1.0 0.3 
0.3 1.0 

0.6 0.4 
0.7 0.5 

The first step is to find eigenvalues and eigenvectors of Exx and Eyy , so that the needed 
matrices can be prepared. We have the following 

1.1154 -0.2989 

-0.2989 1.1154 

1.0989 -0.3297 

-0.3297 1.0989 

0.9659 0,2588 
0.2588 0.9659 

The second step is to find the CCA matrix and solve the CCA eigenvalue problem. 

0.2393 0.3216 


-- 1/2 

-•XX 


^YY ~ 


-. 1/2 

■^XX 


(107) 


(108) 


(109) 




0.3216 0.4347 


( 110 ) 


The eigenvalues and eigenvectors of this matrix are 

= 0.6731, A2 = P2 = 0.0009, 


( 111 ) 
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and 


[ei 62 ] = 


0,5955 0.8033 

0.8033 -0.5955 

The third step is to find the canonical variables. For the first eigen-mode, we have 

0.4241 1.0740 


( 112 ) 


[ai a 2 ] = 62 ]== 


[bi b2] = — SyyEyxS;fx^[ai a2] = 

Pi 


0.7180 -0.9043 

0.8016 0.6755 

0.4039 -0.9674 


(113) 


(114) 


The canonical variables are 


Ui = a'lX = 0.4241X1 -h O.7I8OX2, U 2 = a'sX = 1.0740Xi - 0.9043X2, (115) 

Vi = b'iY - O-SOieYi + 0.4039y2, V 2 = b^Y = 0.6755Yi ~ 0.9674F2- (Hb) 

The maximal correlation between the canonical variables Ui and Yx is pi = 0.8204. This 
result can be verified by using the above two equations and the given covariance matrices at 
the beginning of the example: 


Cov(^7i,Yi) = ([/iVi) 

= 0.4241 - 0.8016(Xiyi) + 0.4241 • 0.4039(Xiy2) + 

0.0.7180 • 0.0.8016(X2yi) -h 0.7180 • 0.4039(y2y2) 
= 0-8204. 


This can also be verified in matrix notation: 

. {UiVi) {U 1 V 2 ) 

[ai a2]'Sxy[bi b2] 


0.8204 0.0000 
0.0000 0.0295 


(117) 


{U2V1) {U2V2) 

It is also interesting to see how the random variable X are represented by the canonical 
variables. To do so, one has to find all the canonical variables, i.e., Ui and U 2 in the present 
case. The formula is 

X = E^^^(ei 62 )U, (118) 
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where U = (C/i 172)^ is the vector of canonical variables. The above can be written as 

X=t/iEi + (72E2, (119) 

where 

El = Exx®! = {0-7831 0.6218)', Ea = E^x®2 = (0.9301 - 0.3673)' (120) 

are called canonical patterns, as the physical field X is represented by the product between 
the patterns and the canonical variables. These two vectors are usually not orthogonal. 

Similarly, the random variable Y can be represented by the canonical variables and pat- 
terns: 

Y = ViFi + V 2 F 2 , (121) 

where 

Fi = (0.9228 0.3853)', F 2 = (0.6444 - 0.7647)'. (122) 
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APPENDIX B ANOTHER CCA SCHEME OF TWO CON- 
TINUOUS FIELDS 


This CCA algorithm retains the covariance functions in the physical space and is different 
from the one explained in Sections 2 and 3^ which is in the spectral space. When a climate 
field is not expanded into spectral spaces, the current method may be a useful alternative to 
the CCA algorithm described in Section 2. 

The weight functions are expanded in terms of EOFs: 

“(x) = Wn^n(x)/y'^, (123) 

n 

^(y) = E ^rn<f>m{y )/ \/^- (124) 

771 

The eigenfunctions are then determined by the coefficients Un and Vm, which are, in turn, 
determined by linear algebraic equations. 

With the above expansions, the equations of the eigenvalue problems (10) and (11) become 



(125) 

(126) 


Using the orthonormal condition of the EOFs, one can derive the following from equation 
(125) above: 

Um ~ ^ ^ ? (^27) 

n 

where 

Cmn= I dnx [ dnKExy(x,y)%P%^. (128) 

JUx J^Y K 

Substituting these two formulas into (126), one obtains the eigenvalue problem for the 
coefficient-unit vector (ui, U2r ’ 


oo 

^ MlnVn = \ VI- 
n=l 


(129) 


46 


where 


(130) 

(131) 


A = 4Ct7, 

OO 

^In — ^ ^ Oml^mn^ 
m=l 

The coefficient matrix [Min] is symmetric, hence the eigenvalues are real and the eigenvectors 
are orthogonal. Similarly, an eigenvalue problem for the unit vector (ui,U 2 ,--*) can be 
obtained. 

The eigenvalue A is directly related to the maximal correlation (p) between U and V. As 
a matter of fact, 

A = p2. (132) 


The proof is given below. 
The correlation is 


p =Corr(^7,y) 

= f (Klx [ Sxy(x,y)w(x)t;(y) 

J fix 


— ^ ^ CVnn’^m^n* 

m,n 

(133) 

Formula (127) leads to 

m 

— ~ qT y ] y V ^mn'^ni 

“^^771 n 

(134) 

or 


(135) 

since (wi, U2, • •) is a unit vector. 



Similarly, one can derive 

1 = 

2t) 

(136) 

The product of the formulas (135) and (136) yields 



11 

11 

(137) 
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The formulas (135) and (136) also imply that the two Lagrange multipliers are equal to each 
other, 

»? = C = -f- (138) 

This gives the physical meaning of the Lagrange multipliers, which is related to the correlation 
between the two fields under consideration. 
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APPENDIX C OPTIMAL EOF-TRUNCATION ORDER 


optimal truncation orders p and q should take values such that the observed data best 
support the physics. An expansion of a higher order contains too much noise, and that of a 
lower order leaves out some signal. Prom computational aspect, the first H CCA eigenvalues 
p^{Ji z=i 1, • ' ' , H) are equal to one, where H — p-\-q — K-\- \ and K is the temporal length 
of the data streams. According to the CCA prediction error, the forecasting of the first h 
modes have zero error, which is, of course, impossible. Thus, p and q cannot be too large. If 
p = q, then the upper bound is p < (K - l)/2. If there are 20 years of data, then p < 10. To 
give some insight about the truncation, we include some further analysis below. 

When EOFs form a complete basis for a linear space, then the corresponding field can be 
exactly expressed by an infinite EOF series 


n=l 


(139) 


Of course, this is for a continuous field. The completeness condition of the EOF basis is 


expressed by 

OC 

^ ipn{x)i>n{x') = 5{x - x'). (140) 

n=l 

With a discrete field, the climate is contained in a finite dimensional space. Suppose 
that there are Jx discrete points. Then the field X must be contained in a Jx-dimensional 
vector space, denoted by J. The EOF expression of the field should contain as little noise 
as possible. With true EOFs, the climate signal may be contained in a sub-space spanned 
by Nc dimensional sub-space, denoted by 5c, where Nc < Jx- Hence, = 5c + 5e, where 
5g contains no climate signal. The EOFs computed from data are the approximate EOFs. 
The first a few computed EOFs may agree with the true EOFs and contain climate signals. 
The higher order EOFs, whose eigenvalues are close to each other, are far away from the true 
EOFs, according to the North’s rule-of-thumb (North et al., 1982). Therefore, the sub-space 
spanned by the computed EOFs intersects partly with 5c and partly with 5«. The climate 
signal sub-space is denoted by 5^ (sample signal represented by the first p-EOFs) and 5n 
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(sample noise represented by higher order EOFs), Only the EOFs in the signal sub-space are 
used in the forecasting analysis. Some properties of Ss are described below. 

Observed data value at only the station points or grid points. If the EOFs can form a 
complete basis in a vector space Jx ? which is the total number of grid points for parameter 
X, then the field can still be expressed by the EOFs: 


Jx 




n=l 


(141) 


Here, the covariance matrix is defined by 

{X{i,t)X{j,t)) = jf^X{i,t)X{j,t), (142) 


where K is the temporal length of the data stream. The vector €n{i) is the nth eigenvector 


of this covariance matrix, hence is the nth EOF. When K > Jx: the covariance matrix is of 


full rank, and the EOFs form a complete space. The complete condition is 

Jx 

Y, en(i)en(i) = 5ij. (143) 

n=l 


It is also interesting to know that 

1 

— Y Xn{s)Xn{t) = Sst. (144) 

n=l 


and 

-^^X„(t)X„(t)=W (145) 

^ t=i 

WTien K < Jx^ the rank of the covariance matrix is at most K. The covariance matrix 
has at most K non-zero eigenvalues, and hence K definite eigenvectors. These eigenvectors 
cannot form a complete basis in the Jx -dimensional vector space. Hence the EOF expansion 
of the field is equal to the field only in this K years, but they are not equal to each other in 
other years. The equality that holds in the ii"-data years can be proved from the calculation 
algorithm of the EOFs. 

The EOFs are computed from 

Y f-^E^(i,t)^(j,i)) e„(j) = A„e„(t). (146) 

j=l t=l ) 
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The left hand side of the above may be written as 


t=l ^ j=l t=l ^ 


Combining the above two equations, one obtains 




The principal components satisfy that 

lj2Xn{s)X^{t)=5^ 

n=l 


and 


1 ^ 

^mn 


Thus, the inverse matrix of 

^ [Xn(i)] 


(147) 

(148) 

(149) 

(150) 


IS 

[^n(t)]'. 

With this inversion, eq. (148) becomes 

^(i,t) = E\/^^n(t)e„(»)- (151) 

n=l 

According to North’s rule-of-thumb, when two sample-eigenvalues are close to each other, 
the sampling error for EOFs is very large and the mixture of two or more true EOFs into a 
sample EOF occurs. The EOFs are defined as the best approximation of the field by a linear 
combination of EOFs. The best approximation is measured by the ensemble average. That 


IS, 


(X; X{j, t) - E )■ 

j = l V 71=1 / 

When this is approximated by K years of data, we have the following problems: 


(152) 


(a) The approximation most likely has a large error beyond the K years. 
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(b) The mbcture of two or more modes destroys the best approximation. Suppose that the 
sample EOF C4(j) is a mixture of ^4(j) and Then, there exist two constants 04 

and 05 such that 

640) = 04^4 (j) + 05^5 (j)- 

The projection of the field onto 64 becomes the projections onto t/>4 and ips- Because 
of U4 and 05, the ^4 and ^5 projections are not orthogonal and hence not the best 
approximation. North’s rule-of-thumb indicates that the first a few sample EOFs do 
not likely have mixture and hence should be used as a good approximation to true 
EOFs. This is equivalent to say that one takes only the EOFs whose eigenvalues are 
sufficiently separated. These are usually the first a few sample EOFs. The higher order 
EOFs contain the smaJler scales which require more than K years of data to resolve. 
This is a type of aliasing phenomenon. Large sampling size resolves smaller spatial 
scales. 

Unfortunately, there is no definite rule to determine the optimal truncation order p. The 
C-rule is used in this paper. For a given constant C, if 

— ( 153 ) 

then the nth EOF is included. The optimal value p is equal to the maximal n. In this paper 
we choose C = 3 for both SST and precipitation. Examples of the separation of egenvalues 
for SST and US precipitation are displayed in Fig. 11 . Thus, the SON global SST may use 11 
modes. However, there is a strong evidence of mode mixture for the SON global Pacific SST 
between the Vlth and 13 th modes. Thus it is preferable to discard the modes whose order 
is higher than 12 . The DJF precipitation’s mode mixture happens at mode 11 . So, modes 
1-10 may be used for forecasting. However, our CCA algorithm requires the same number of 
modes for both predictor and predict and. Hence we have chosen 10 modes from both SST 
and precipitation EOFs. 
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